TopX: efficient and versatile top-k query processing for text, structured, and semistructured data

Author

  • Martin Theobald
Abstract

TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a monotonic score aggregation function with respect to a multidimensional query. The main contributions of the thesis fall into four points, each supported by previous publications at international conferences or workshops:

  • Top-k query processing with probabilistic guarantees.
  • Index-access optimized top-k query processing.
  • Dynamic and self-tuning, incremental query expansion for top-k query processing.
  • Efficient support for ranked XML retrieval and full-text search.

Our experiments demonstrate the viability and improved efficiency of our approach compared to existing related work for a broad variety of retrieval scenarios.
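The early-termination idea described in the abstract can be illustrated with a minimal sketch of the classic threshold algorithm (TA) for top-k retrieval. This is not TopX's own scheduling logic, which the abstract does not spell out. The function name `threshold_topk`, the in-memory list layout, the use of a plain sum as the monotonic aggregation function, and the assumption of non-negative scores are all choices made here for illustration.

```python
import heapq


def threshold_topk(index_lists, k):
    """Classic threshold algorithm (TA) over score-sorted index lists.

    index_lists: one list per query dimension, each a list of
    (doc_id, score) pairs sorted by descending, non-negative score.
    Aggregation is a plain sum, which is monotonic (illustrative choice).
    Returns (top-k as [(doc_id, total)], number of sorted accesses).
    """
    if k <= 0:
        return [], 0
    # Random-access tables: doc_id -> score in each list (0.0 if absent).
    lookup = [dict(lst) for lst in index_lists]
    seen = set()
    top = []  # min-heap of (total, doc_id), holds at most k entries
    accesses = 0
    max_depth = max((len(lst) for lst in index_lists), default=0)
    for depth in range(max_depth):
        last_scores = []
        for lst in index_lists:
            if depth >= len(lst):
                # Exhausted list: unseen documents score 0.0 here.
                last_scores.append(0.0)
                continue
            doc_id, score = lst[depth]
            accesses += 1
            last_scores.append(score)
            if doc_id in seen:
                continue
            seen.add(doc_id)
            # Resolve the full score through random accesses.
            total = sum(table.get(doc_id, 0.0) for table in lookup)
            if len(top) < k:
                heapq.heappush(top, (total, doc_id))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, doc_id))
        # No unseen document can score above this bound.
        threshold = sum(last_scores)
        if len(top) == k and top[0][0] >= threshold:
            break  # the current top-k is provably final
    ranked = sorted(top, key=lambda entry: (-entry[0], entry[1]))
    return [(doc_id, total) for total, doc_id in ranked], accesses
```

Called with two four-entry lists, for example `[("d1", 0.75), ("d2", 0.5), ("d3", 0.25), ("d4", 0.125)]` and `[("d2", 1.0), ("d1", 0.5), ("d4", 0.25), ("d3", 0.125)]`, and `k = 2`, the sketch stops after reading two entries per list, because the sum of the last scores read (the threshold) no longer exceeds the lowest score in the current top-2.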

Similar resources

TopX - Efficient and Versatile Top-k Query Processing for Text, Semistructured, and Structured Data

This paper presents a comprehensive overview of the TopX search engine, an extensive framework for unified indexing and querying large collections of unstructured, semistructured, and structured data. Residing at the very synapse of database (DB) engineering and information retrieval (IR), it integrates efficient scheduling algorithms for top-k-style ranked retrieval with powerful scoring model...

An Efficient and Versatile Query Engine for TopX Search

This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold algorithms for top-k query processing with a focus on inexpensive sequential accesses to index lists and only a few judiciously scheduled random accesses. The difficulties in applying the existing ...

Guest Editors Introduction: Special Section on Keyword Search on Structured Data

WITH the prevalence of Web search engines, keyword search has become the most popular way for users to retrieve information from text documents. On the other hand, there is an enormous amount of valuable information stored in structured form (relational or semistructured) in Internet, intranet, and enterprise databases. To query such data sources, users traditionally depended on specialized app...

Similarity Measures for Query Expansion in TopX

TopX is a top-k retrieval engine for text and XML data. Unlike some other engines, TopX includes an ontology. This ontology allows TopX to use techniques like word sense disambiguation and query expansion, to search for words similar to the original query terms. These techniques allow finding data items which would be ignored for the original source query, due to missing of words similar to the...

TopX and XXL at INEX 2005

We participated with two different and independent search engines in this year’s INEX round: the XXL Search Engine and the TopX engine. As this is the first participation for TopX, this paper focuses on the design principles, scoring, query evaluation, and results of TopX. We briefly discuss the results with XXL afterwards. 1 TopX – System Overview: Our query processing methods are based on preco...

Publication date: 2006